ABSTRACT



A key issue when designing and using DNA-targeting 
nucleases is specificity. Ideally, an optimal DNA- 
targeting tool has only one recognition site within a 
genomic sequence. In practice, however, almost all 
designer nucleases available today can accommo- 
date one to several mutations within their target 
site. The ability to predict the specificity of targeting 
is thus highly desirable. Here, we describe the first 
comprehensive experimental study focused on the 
specificity of the four commonly used repeat 
variable diresidues (RVDs; NI:A, HD:C, NN:G and 
NG:T) incorporated in transcription activator-like 
effector nucleases (TALEN). The analysis of >15500 
unique TALEN/DNA cleavage profiles allowed us to 
monitor the specificity gradient of the RVDs along 
a TALEN/DNA binding array and to present a 
specificity scoring matrix for RVD/nucleotide associ- 
ation. Furthermore, we report that TALEN can 
only accommodate a relatively small number of 
position-dependent mismatches while maintaining a 
detectable activity at endogenous loci in vivo, 
demonstrating the high specificity of these molecular 
tools. We thus envision that the results we provide 
will allow for more deliberate choices of DNA binding 
arrays and/or DNA targets, extending our engineer- 
ing capabilities. 



INTRODUCTION 

The DNA-binding domain derived from transcription 
activator-like effectors (TALE) has emerged in the past 
few years as a scaffold of choice to develop tailor-made 
DNA-binding fusion proteins (1-3). The sequence specificity
of this family of proteins, involved in the natural infection
process of the plant pathogens of the Xanthomonas
genus, is driven by a domain composed of repeated motifs 
of 33-35 amino acids. The specificity results from two 
polymorphic amino acids, the so-called repeat variable 
diresidues (RVDs) (4,5), located at positions 12 and 13 
of a repeated unit. The recent achievement of the high- 
resolution structure of TALEs bound to DNA confirmed 
that each single base of the same strand of the DNA target 
is contacted by a single repeated unit in a 5'-3' direction 
(in line with the protein N-terminal to C-terminal). These
structural studies also pointed out that the amino acid at 
position 13 contacts, in the major groove, the top DNA 
strand base, whereas the amino acid at position 12 par- 
ticipates in the stabilization of the repeated units (4,5). In 
addition to the central core mediating the sequence- 
specific DNA interaction, natural TALEs are composed 
of two additional domains. The N-terminal translocation 
domain is responsible for the preferential requirement of a 
first thymine base (the so-called T0) in the targeted
sequence, and the C-terminal domain contains nuclear lo- 
calization signals (NLS) and a transcriptional activation 
domain. 

By analyzing sequences of known TALEs and corres- 
ponding DNA targets in rice promoters, two groups 
identified a code governing the preferential pairing of 
RVDs with DNA bases (6,7). With this straightforward
one-to-one RVD/nucleotide association code (NI:A,
HD:C, NN:G and NG:T), TALE DNA-binding 
domains have rapidly become a promising platform for 
the creation of molecular tools such as nucleases 
(TALEN) (8-11), recombinases (12), transcription activa- 
tors (13-17) and repressors (18,19). Despite the increasing 
number of publications reporting array assembly methods 
and successes in targeting the desired endogenous DNA 
sequences, little is known about the intrinsic RVD specificity
along TALE DNA binding arrays (18,20-22). 

Defining precisely TALEN specificity and being able to 
predict potential off-site targeting is of crucial importance 
to fully assess the potential of this technology, notably for 
therapeutic applications. In this study, we report an exten- 
sive analysis of the degeneracy of the RVD/nucleotide as- 
sociations in the context of TALEN. We focus on the four 
commonly used conventional RVDs (NI, HD, NN and
NG) incorporated in >350 TALE DNA binding arrays. 
We developed two model systems, in yeast (extrachromosomal,
high throughput) and mammalian cells
(intrachromosomal, medium throughput), that allow com- 
prehensive studies of specificity and activity of TALEN at 
either the levels of single RVD/nucleotide association or 
the complete array. The analysis of the nuclease profiles 
resulting from >15 500 TALEN/DNA pairs in yeast
allowed us to define an experimental model of the specificity
of RVD/nucleotide associations, further validated to
score the outcome of TALEN/targets mismatches in mam- 
malian cells. In addition, the minimum number of 
mismatches required within a TALEN/target pair to significantly
abolish activity at endogenous loci in vivo was
determined. Consequently, our results set the stage for a 
more rational design of TALEN and contribute to a better 
understanding of the impact of the number, positions and 
types of mismatches within a TALEN/DNA binding array 
and DNA target. 

MATERIALS AND METHODS 

TALE arrays 

All TALE or TALEN arrays were obtained from Cellectis 
Bioresearch (Paris, France). TALEN™ is a trademark
owned by Cellectis Bioresearch. Sequences of TALEN 
backbones, TALEN RVD array composition and/or 
relevant targets are presented in Supplementary Tables 
S1, S2 and S5-S8.

Extrachromosomal SSA assay in yeast 

Mutants (TALEN-containing yeast strains) were gridded
at high density (~20 spots/cm²) on nylon filters
placed on solid agar-containing YP-glycerol plates, using
a colony gridder (QpixII, Genetix). A second layer, consisting
of reporter-harboring yeast strains, was gridded on
the same filter for each target. Membranes were incubated
overnight at 30°C to allow mating. To select diploids,
filters were then placed and incubated for 2 days at 30°C
on a medium lacking leucine (for the mutant) and tryptophan
(for the target) with glucose (2%) as the carbon
source. To induce the expression of the TALEN, filters
were transferred onto YP-galactose-rich medium for
24-48 h at 30 or 37°C. To monitor TALEN activity
through the β-galactosidase activity, filters were finally
placed on solid agarose medium containing 0.02%
X-Gal in 0.5 M sodium phosphate buffer, pH 7.0, 0.1%
SDS, 6% dimethyl formamide, 7 mM β-mercaptoethanol
and 1% agarose and incubated at 37°C for up to 48 h.

Filters were scanned and each spot was quantified using 
the median values of the pixels constituting the spot. We 
attribute the arbitrary values 0 and 1 to white and dark 
pixels, respectively. β-Galactosidase activity is directly
associated with the efficiency of homologous recombination,
and thus with the cleavage efficiency of the TALEN.
Any value >0 is considered as the consequence of
cleavage. For all our large-scale analyses, we considered
nuclease activity to be robust when it exceeded a threshold (t)
of 0.45. This value corresponds to the mean
value of the negative controls (m) plus three times their
standard deviation (s) (t = m + 3s).
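
As an illustration of this thresholding step, the following minimal Python sketch quantifies a spot as the median of its pixel values and derives t = m + 3s from a set of negative controls. The function names and the control values are hypothetical, and the use of the sample standard deviation is an assumption, as the estimator is not specified here.

```python
from statistics import mean, median, stdev

def spot_value(pixels):
    """Quantify one spot as the median of its pixel values (0 = white, 1 = dark)."""
    return median(pixels)

def activity_threshold(negative_controls):
    """Return t = m + 3s, with m the mean and s the standard deviation of the controls."""
    m = mean(negative_controls)
    s = stdev(negative_controls)  # assumption: sample standard deviation
    return m + 3 * s

def is_robust(activity, threshold):
    """A spot counts as robustly active when its value exceeds the threshold."""
    return activity > threshold

# Hypothetical negative-control spot values, for illustration only
controls = [0.10, 0.12, 0.08, 0.11, 0.09]
t = activity_threshold(controls)
print(f"threshold t = {t:.3f}")
print(is_robust(spot_value([0.6, 0.7, 0.65]), t))
```

With real data, the controls would be the quantified spots of the negative-control crosses, and the resulting t would be applied to every TALEN/target spot of the screen.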

Endogenous green fluorescent protein activity assay 

CHO-K1 (CGPS-CHOK1, Cellectis Bioresearch) cells
containing the chromosomally integrated green fluorescent
protein (GFP) reporter gene including the TALEN
recognition sequence (TGAACCGCATCGAGCTG
aagggcatcgacTTCAAGGAGGACGGCAA) were
cultured at 37°C with 5% CO2 in a F12-K complete
medium supplemented with 2 mM L-glutamine, penicillin
(100 IU/ml), streptomycin (100 μg/ml), amphotericin B
(Fongizone: 0.25 μg/ml, Life Technologies) and 10%
fetal bovine serum (FBS). Cell transfection was performed
according to the manufacturer's instructions using the
Nucleofector apparatus (Amaxa, Lonza). Adherent
CHO-K1 cells were harvested at Day 1 of culture,
washed with phosphate-buffered saline (PBS), trypsinized 
and resuspended in T nucleofection solution to a concentration
of 1 × 10^6 cells/100 μl. Subsequently, 5 μg of each
of the two TALEN expression vector pairs (10 μg final
DNA amount) was mixed with 0.1 ml of the CHO-K1
cell suspension, transferred to a 2.0-mm electroporation 
cuvette and nucleofected using program U_023 of the 
Amaxa Nucleofector apparatus. Within 20 min after
nucleofection, 0.5 ml of prewarmed F12-K medium was 
added to the electroporation cuvette. Cells were then 
transferred to a Petri dish containing 10 ml F12-K 
medium and cultured at 37°C with 5% CO2, as previously 
described. On Day 3 post-transfection, cells were washed 
with PBS, trypsinized, resuspended in 5 ml and the per- 
centage of GFP-negative cells was monitored by flow 
cytometry (Guava EasyCyte, Merck Millipore). 

Extrachromosomal assay in CHO-K1 cells

Activity in CHO-K1 cells was measured as previously
reported by Valton et al. (23). In brief, cells were transfected
(PolyFect, Qiagen) with the two TALEN expression
vectors and the reporter plasmid. Three days post-transfection,
β-galactosidase was quantified at 420 nm using
ONPG in a liquid assay. The entire process was performed
using a 96-well plate format on an automated Velocity 11 
BioCel platform. 

Endogenous targeted mutagenesis

The 293H cells were cultured at 37°C with 5% CO2
in Dulbecco's modified Eagle's medium (DMEM)
complete medium supplemented with 2 mM L-glutamine,
penicillin (100 IU/ml), streptomycin (100 μg/ml), amphotericin
B (Fongizone: 0.25 μg/ml, Life Technologies) and
10% FBS. Adherent 293H cells were seeded at 1.2 × 10^6
cells in 10 cm Petri dishes a day before transfection. Cell
transfection was performed using the Lipofectamine 2000
reagent according to the manufacturer's instructions
(Invitrogen). For each transfection, 2.5 μg of each of the two
TALEN nuclease expression plasmids and 10 ng of GFP
expression vector (5 μg final DNA amount) were mixed
with 0.3 ml of DMEM without FBS. In another tube,
25 μl of Lipofectamine were mixed with 0.3 ml of
DMEM without FBS. After 5 min incubation, both 
DNA and Lipofectamine mixes were combined and 
incubated for 25 min at RT. The mixture was transferred 
to a Petri dish containing the 293H cells in 9 ml of 
complete medium and then cultured at 37°C with 5% 
CO2. Three days post-transfection, the cells were washed 
with PBS, trypsinized, resuspended in 5 ml complete 
medium and the percentage of GFP-positive cells was 
measured by flow cytometry (Guava EasyCyte) to 
monitor transfection efficacy. Cells were pelleted by cen- 
trifugation and genomic DNA was extracted using 
DNeasy Blood & Tissue Kit (Qiagen), according to 
the manufacturer's instructions. Polymerase chain reac- 
tion of the endogenous loci was performed using the oligo- 
nucleotide sequences presented in Supplementary Table S6 
and purified using the AMPure kit (Invitrogen). 
Amplicons were further analyzed by deep sequencing 
using the 454 system (Roche). 

TALE protein expression and purification 

The TALE IL2RG was purified as previously described for 
TALE AvrBs3 (24). The clarified lysate of the cells 
overexpressing TALE IL2RG was loaded onto a Ni-NTA 
(GE-Healthcare) column and eluted with a linear gradient to 
500 mM imidazole. Fractions containing the protein were 
loaded onto a heparin column and eluted by a linear salt 
gradient to 1 M NaCl. The protein was then loaded onto a 
Superdex 200 (GE-Healthcare) gel filtration column. 

Fluorescence anisotropy 

The dissociation constants between the TALE IL2RG 
protein and dsDNAs were estimated from the change in 
fluorescent polarization of complexes between protein and 
6-FAM-labeled dsDNAs. The 20-bp DNAs for this assay 
were annealed by slow cooling in 25 mM Hepes (pH 8.0) 
and 150 mM NaCl at a final duplex concentration of 500 nM.
The optimal concentration of 6-FAM-DNAs for the 
assay was empirically determined by measuring the fluor- 
escence polarization of serially diluted 6-FAM-DNAs 
samples (24). The concentration of the 6-FAM DNAs 
ranged between 20 and 40 nM and that of the TALE 
IL2RG protein ranged between 0 and 1000 nM. Both 
proteins and dsDNAs were dialyzed in buffer containing 
25 mM Hepes (pH 8), 150mM NaCl and 0.2 mM TCEP. 



After incubation at 25°C for 10 min, the fluorescence po- 
larization was measured in a black 96-well assay plate with 
Wallac Victor2V 1420 multilabel counter (PerkinElmer). 
The fitting of the data and the Kd calculations were done 
as previously described in (24). 

Data analysis 

Context dependence was analyzed taking into account
only TALEN activity on their cognate targets. For each
nucleotide pair ($N_i$, $N_j$), we studied how the presence of $N_j$
just on the right (or on the left) of $N_i$ influenced
activity. For all targets containing the subsequence $N_iN_j$
(or $N_jN_i$), we computed the ratio
$R(N_i,N_j) = A(N_i,N_j)/\langle A(N_k,N_j)\rangle$,
where $A(N_i,N_j)$ is the activity on the target
and $\langle A(N_k,N_j)\rangle$ is the average activity on all targets
where $N_i$ is replaced by any of the four nucleotides $N_k$. If
there is complete context independence, $R(N_i,N_j)$ should
not depend on $N_j$, and for a given $N_i$, all values for the
four $N_j$ nucleotides should be the same. The average value
and standard deviation of $R(N_i,N_j)$ are shown in
Supplementary Figure S7A and B, where $N_i$ is indicated on the
axis and $N_j$ is represented by various colors.
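
The ratio described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `activity` dictionary (target sequence mapped to the activity of its cognate TALEN), the function names and the toy values are hypothetical, and only targets for which all four variants at the studied position were measured are retained.

```python
from collections import defaultdict
from statistics import mean, stdev

BASES = "ACGT"

def context_ratios(activity, pos, neighbor_offset=1):
    """Collect R(Ni, Nj) = A(Ni, Nj) / <A(Nk, Nj)> for the base Ni at `pos`.

    `neighbor_offset` is +1 for the right neighbor and -1 for the left one;
    the caller must make sure pos + neighbor_offset is a valid index.
    Returns a dict mapping (Ni, Nj) to the list of observed ratios.
    """
    ratios = defaultdict(list)
    for seq, a in activity.items():
        ni, nj = seq[pos], seq[pos + neighbor_offset]
        # The four targets obtained by replacing Ni with each nucleotide Nk
        variants = [seq[:pos] + k + seq[pos + 1:] for k in BASES]
        if not all(v in activity for v in variants):
            continue
        avg = mean(activity[v] for v in variants)
        if avg > 0:
            ratios[(ni, nj)].append(a / avg)
    return ratios

def summarize(ratios):
    """Mean and standard deviation of R for each (Ni, Nj) pair."""
    return {k: (mean(v), stdev(v) if len(v) > 1 else 0.0) for k, v in ratios.items()}

# Toy data: activity depends only on the first base, so R must not depend on Nj
first_base_effect = {"A": 1.0, "C": 0.5, "G": 0.5, "T": 1.0}
toy = {a + b: first_base_effect[a] for a in BASES for b in BASES}
r = context_ratios(toy, pos=0)
print(summarize(r)[("A", "C")])
```

In the toy data set the ratio for a given first base is identical for all four neighbors, which is the signature of complete context independence.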

To compute specificity matrices, we took all the
activities of TALEN on targets $T_m$ differing by one
mutation from their cognate target $T$, and
computed the drop of activity $R = A(T_m)/A(T)$, where
$A(T_m)$ and $A(T)$ are the activities on mutated and
cognate targets, respectively. By definition, R is equal to
1 for the cognate nucleotide corresponding to the code.
The specificity matrix was computed by calculating the
average of these R values for all the available pairs for
each given position and represented by gray levels
(black = 1, white = 0). An overall specificity matrix was
computed by averaging the matrices from positions 1 to 7.
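
A minimal Python sketch of this matrix computation is given below. The record layout (position, RVD, nucleotide, activity on the mutated target, activity on the cognate target), the function names and the example values are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean

# RVD/nucleotide code used to define the cognate pairings
CODE = {"NI": "A", "HD": "C", "NN": "G", "NG": "T"}

def position_matrices(records):
    """Average R = A(Tm)/A(T) per position and per (RVD, nucleotide) pair.

    records: iterable of (position, rvd, nucleotide, a_mut, a_cognate).
    Returns {position: {(rvd, nucleotide): mean R}}.
    """
    grouped = defaultdict(list)
    for pos, rvd, nt, a_mut, a_cog in records:
        if a_cog > 0:
            grouped[(pos, rvd, nt)].append(a_mut / a_cog)
    matrices = defaultdict(dict)
    for (pos, rvd, nt), rs in grouped.items():
        matrices[pos][(rvd, nt)] = mean(rs)
    for pos in matrices:
        for rvd, nt in CODE.items():
            matrices[pos][(rvd, nt)] = 1.0  # R is 1 by definition for the cognate pair
    return dict(matrices)

def overall_matrix(matrices):
    """Average the per-position matrices into a single overall matrix."""
    pooled = defaultdict(list)
    for m in matrices.values():
        for key, val in m.items():
            pooled[key].append(val)
    return {k: mean(v) for k, v in pooled.items()}

# Hypothetical measurements, for illustration only
records = [
    (1, "NI", "C", 0.2, 0.8),
    (1, "NI", "C", 0.4, 0.8),
    (2, "NI", "C", 0.6, 0.8),
]
per_position = position_matrices(records)
overall = overall_matrix(per_position)
print(overall[("NI", "C")])
```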

We predicted the value on mutated targets using the
overall specificity matrix by taking the value $V_0$ of the
TALEN on its cognate target and multiplying it by the
product of the square roots of the matrix coefficients
$C(\mathrm{RVD}_i,N_i)$ corresponding to all RVD/nucleotide pairs,
giving a final value $V = V_0 \times \prod_i \sqrt{C(\mathrm{RVD}_i,N_i)}$. The
square root was taken because the specificity matrix was
obtained from symmetrically mutated targets, with one
mismatch on one side corresponding, in fact, to two mutations
on the target.
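
The prediction rule can be written as a short Python function. This is a sketch only: the coefficient values in `matrix` are hypothetical placeholders rather than the measured coefficients, and the function name is an assumption.

```python
from math import prod, sqrt

def predict_activity(v0, rvds, target, matrix):
    """Predicted value V = V0 * prod_i sqrt(C(RVD_i, N_i)).

    v0:     activity of the TALEN on its cognate target
    rvds:   list of RVDs of the array, in N- to C-terminal order
    target: nucleotide string of the same length as `rvds`
    matrix: dict mapping (rvd, nucleotide) to a specificity coefficient
    """
    if len(rvds) != len(target):
        raise ValueError("array and target must have the same length")
    return v0 * prod(sqrt(matrix[(r, n)]) for r, n in zip(rvds, target))

# Hypothetical coefficients: 1 for cognate pairs, 0.25 for one mismatched pair
matrix = {("NI", "A"): 1.0, ("HD", "C"): 1.0, ("NN", "G"): 1.0, ("NI", "C"): 0.25}

v = predict_activity(0.8, ["NI", "HD", "NN"], "CCG", matrix)
print(v)  # one NI:C mismatch scales the cognate value by sqrt(0.25)
```

Taking the square root of each coefficient converts the effect measured on a symmetric (two-mutation) target into the effect of a single mismatch.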

To determine the probability of finding off-site targets, we
randomly drew 15 000 sequences of 200 bp each from the
human genome and randomly selected a potential
TALEN site available on each. For each potential
TALEN site, off-site targets were determined as all sequences
having binding site pairs diverging from those
of the TALEN site by three mismatches or fewer and
having a spacer length ranging from 9 to 30 bp. All combinations
(left + right, left + left, right + right) were taken
into account.
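
The search criteria above can be illustrated with a brute-force scan in Python. This is a simplified sketch, not the procedure actually used: it assumes that the three-mismatch limit applies to the total over both binding sites, it scans a single strand of a short sequence, and all names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def find_off_sites(genome, left, right, max_mm=3, spacers=range(9, 31)):
    """Return (start, spacer length, total mismatches) for every candidate site.

    A candidate is a first binding site on the scanned strand, a spacer of
    9 to 30 bp, and a second binding site on the opposite strand, for the
    combinations left + right (both orientations), left + left and right + right.
    """
    hits = []
    for first, second in ((left, right), (right, left), (left, left), (right, right)):
        second_rc = revcomp(second)
        n1, n2 = len(first), len(second_rc)
        for i in range(len(genome) - n1 + 1):
            mm1 = mismatches(genome[i:i + n1], first)
            if mm1 > max_mm:
                continue
            for sp in spacers:
                j = i + n1 + sp
                if j + n2 > len(genome):
                    break
                mm = mm1 + mismatches(genome[j:j + n2], second_rc)
                if mm <= max_mm:
                    hits.append((i, sp, mm))
    return hits

# Hypothetical binding sites and a toy sequence containing the intended site
left, right = "TACGTACGA", "TGGCCATTC"
genome = "CC" + left + "G" * 12 + revcomp(right) + "CC"
hits = find_off_sites(genome, left, right)
print(hits)
```

The intended site itself is reported with zero mismatches, so it has to be excluded when counting off-site targets. A genome-scale search would require an indexed approach rather than this quadratic scan.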

RESULTS 

Experimental setup for the mammalian activity screening 

To investigate the specificity of the NI:A, HD:C, NN:G
and NG:T RVD/nucleotide pairing, we first sought to
design an experimental setup that would focus on the enzymatic
activity of the nuclease while minimizing or
normalizing variations due to interfering parameters
such as chromatin accessibility and epigenetic modification.
Toward this goal, we developed and used a model
CHO-K1 cell line containing a unique integrated GFP
reporter gene. Small deletions or insertions (indels)
produced by the non-homologous end-joining (NHEJ)
repair pathway at the double-strand break (DSB) site
(generated by the TALEN) led to the gene knockout,
and thus to a GFP-negative phenotype. Therefore, we
monitored the extinction of GFP, due to the activity of
a TALEN on a unique specific sequence within this
reporter gene (Figure 1A).

Throughout this study, we used two different TALEN 
scaffolds, namely, a +C40 and a +C11-SGSGSGG. In our
hands, these two scaffolds presented the best balance 
between the possibilities to obtain a very high activity 
while keeping a narrow spacer window to improve the 
specificity. We first designed a highly active TALEN
(further referred to as wt) within the GFP gene (up to
70% of GFP disruption), where the RVD/nucleotide association
fitted the described NI:A, HD:C, NN:G and
NG:T code (6,7). Along with this TALEN, we created 
five collections of additional TALEN that contained, in 
one of the two TAL monomers, alternative (further 
referred to as mismatches) RVD/nucleotide pairings. These
artificial mismatches were located at positions 1-2-3 
(Collection 1), 8-9 (Collection 2), 10-11 (Collection 3), 
12-13-14 (Collection 4) and 14-15 (Collection 5), as 
defined from the first thymine base (T0) (Figure 1B and
Supplementary Table S1). These 110 TALEN were thus
tested in our CHO-K1 model cell line, and the percentage
of GFP-negative cells was recorded. We first analyzed the
global effect of mismatch numbers on the activity, taking
into account all datasets, and found a clear negative correlation,
although this correlation could mask particular positioning
effects (r = -0.68, P = 4e-16, Figure 1C). We thus
decided to further decipher the RVD/nucleotide pairing 
specificity by monitoring the effect of increasing number 
of mismatches (1, 2 or 3) as a function of their position in 
the array (sliding of the experimental window from the 
N- to the C-terminus). The analysis of the data pointed 
out that while the presence of a single mismatch had a 
limited and similar impact regardless of its position in the
array (P = 0.08), multiple mismatches had a more
pronounced effect when positioned at the N-terminal
end rather than at the C-terminal end (P = 4e-4 for
two mismatches and P = 2e-4 for three mismatches,
Figure 1D, E and F). A near-complete loss of activity
was observed when three consecutive mismatches were
positioned close to the N-terminus, whereas a significant
activity (up to ~40% of the wt TALEN) was detected for
mismatches close to the C-terminal end (Figure 1F).

Nevertheless, variations in activities were clearly de- 
pendent on multiple parameters, such as the total 
number of mismatches, their position in the array or 
their identity (RVD/nucleotide association). This latter 
parameter was only partially taken into account in our 
experimental setup, as the targeted sequence remained 
constant for all TALEN. We thus envisioned performing
a comprehensive analysis of the RVD/nucleotide pairing
focused especially on the N-terminal end of the array, as this
end showed the higher specificity. For such large-scale
analyses requiring the screening of collections of
mutants versus collections of targets, our current mammalian
cell model system turned out to be inadequate. To address
this technical limitation, we used a plasmid-based assay in 
yeast cells to perform high-throughput nuclease activity 
screenings (25). 

To validate a single-strand annealing (SSA) assay as
readout, we performed a comparative nuclease activity
study of a subset of 13 TALEN from the original GFP
dataset (Collections 1-5). We found a particularly good
correlation between results from an extrachromosomal
SSA assay in CHO-K1 and our previous chromosomal
disruption experiments (r = 0.88 with P = 3.8e-05,
Supplementary Figure S1). In addition, taking into
account that we have previously reported a very good correlation
between the yeast and CHO-K1 extrachromosomal
SSA assays (26), we anticipated that the yeast
model system could serve as an appropriate and represen- 
tative high-throughput assay to study TALEN activity 
and specificity. 

Experimental setup of the yeast high-throughput nuclease 
activity screening 

In the yeast nuclease activity assay, a yeast strain expressing
the nuclease of interest is crossed with another strain harboring
a reporter plasmid containing the target sequence.
This target sequence is flanked by overlapping truncated
LacZ genes. On target cleavage, recombination through the
SSA pathway restores a functional LacZ gene, whose activity
can be quantified
and related to the nuclease efficiency. In addition, to
minimize bias, amplify potential effects and simplify subsequent
analysis, we performed initial experiments using
a particular homodimeric TALEN architecture. In this
architecture, the targeted sequence is composed of two
duplicated sequences in inverse orientation facing each
other (separated by the so-called spacer sequence) on
both DNA strands. This setup implies, inter alia, that a
specific mismatch (between an RVD and a nucleotide of
the target) on one TALEN arm will be found symmetrically
on the other TALEN arm. In addition, as
most naturally occurring TALEs, including our AvrBs3
scaffold, bind to targets starting with a T (corresponding
to the so-called T0), we kept this feature in all our
experimental designs.
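
The symmetry of the homodimeric target can be made concrete with a small Python sketch. The half-site and spacer sequences and the function names are hypothetical; the example only shows that a single substitution in the half-site appears twice in the full target.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def homodimer_target(half_site, spacer):
    """Build a target made of a half-site, a spacer and the inverted half-site.

    The half-site is written 5'-3' and starts with the T0 thymine; the second
    copy lies on the opposite strand, hence the reverse complement.
    """
    return half_site + spacer + revcomp(half_site)

def count_differences(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical half-site (T0 followed by nine bases) and 15-bp spacer
half_site = "TACGTACGAT"
spacer = "ACGTACGTACGTACG"
cognate = homodimer_target(half_site, spacer)

# One substitution in the half-site (G to C at index 3)
mutated = homodimer_target("TACCTACGAT", spacer)
print(count_differences(cognate, mutated))
```

A single RVD/nucleotide mismatch in the array therefore corresponds to two mutations in the homodimeric target, one per half-site.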

As a first experiment, we determined an optimal 
TALEN repeat-array length to assure an adequate 
dynamic range for activity and sensitivity in our yeast 
screening assay (25). In this experiment, 52 TALEN, con- 
taining 9.5-15.5 repeats, were tested on their cognate 
targets (Figure 2A and Supplementary Table S2). As pre- 
viously observed in other studies (11), no significant cor- 
relation between the size of the array and the nuclease 
activity could be determined (r = 0.27, P = 0.05). 
Second, based on our previous results in CHO-K1
showing a decrease of specificity from the N- to the
C-terminal end of the array, we monitored the effect of
mismatches on the last two positions (called N and N-1) of
the TALEN array as a function of the array length. A
subset (14 TALEN) of the previous TALEN collection
was screened for activity on targets containing the four
possible nucleotides at positions N and N-1. When the
activity toward the mismatched targets was averaged
and compared with the activity on the wild-type target,
a statistically significant correlation between the TAL
array length and the loss of activity due to mismatches
at these positions was obtained (position N: r = 0.68,
P = 4e-6; position N-1: r = 0.7, P = 7e-6, Figure 2B).
Based on the previous results and data from the literature,
we further focused on the shorter 9.5-repeat model to
perform an extensive investigation of the specificity of
RVDs (from positions 1 to 7), as in this configuration
all RVD/nucleotide pairings are essential for activity.

Figure 1. Gene disruption: relative activities of collections of TALEN at the integrated GFP locus. (A) Schematic representation of the WT GFP
TALEN on the chromosomal target. (B) Collections of TALE used at the chromosomal GFP locus derived by mutation of the right DNA binding
domain. X represents any of the four RVDs, namely, NI, HD, NN and NG. Positions are numbered relative to the first thymine of the target (T0).
(C) Influence of the number of mismatches on the GFP disruption. The activity ratio between the mismatched and the WT TALEN is represented on a
boxplot, indicating the median (thick bar), quartiles (box) and extreme values (r = -0.68, P = 4e-16). Mismatches are defined relative to the NI:A,
HD:C, NN:G and NG:T code. The size of the sample is indicated in brackets. (D) Boxplot representation, including the median (thick bar),
quartiles (box) and extreme values, of the activity ratio between the mismatched and the WT TALEN as a function of the collections for one
mismatch. P = 0.08 (Kruskal-Wallis test). (E) Same as for (D) but for two mismatches. P = 4e-4. (F) Same as for (D) but for three mismatches.
P = 2e-4.

Figure 2. Design of experimental setup and optimal TALEN array
length for the nuclease activity screening. (A) Activities of 52
TALEN pairs measured in vivo, as a function of the repeat array
length. Error bars represent standard errors. (B) Activities of 15
TALEN pairs on targets randomized in their positions N and N-1.
Error bars represent standard errors. An analysis of variance demonstrates
a significant effect of the TALEN length on specificity (position
N: r = 0.68, P = 4e-6, Kruskal-Wallis test; position N-1: r = 0.7,
P = 7e-6). The yeast activity assay is based on the single-strand annealing
(SSA) pathway used after the creation of a DSB by the TALEN
in the target sequence. Target sites were designed to allow TALEN use
in the homodimer format. All TALEN pairs showed a significant
activity. The number of TALEN/target pairs for each class is indicated
in brackets.

Therefore, we performed a comprehensive study of
TALEN activity and specificity by systematically
changing DNA-binding modules to create collections of
TALE arrays (9.5 repeats) containing, at three defined
consecutive positions, all 64 possible RVD triplets (combinations
of HD, NG, NI and NN). These triplets were
located either at positions 1-2-3 (Collection 6), 3-4-5
(Collection 7) or 5-6-7 (Collection 8) of the RVD array,
as defined from the first thymine base (T0) (Figure 3A)
(6,7). To cross-validate our results with respect to differences
in global affinity, an additional collection located at
positions 1-2-3 (Collection 9) of a longer array (18.5 RVDs)
was also created. These TALEN collections were assayed
against their respective 64 targets, containing all possible
triplets of the four bases at the adequate positions. Altogether, up
to 4096 TALEN/target combinations per collection were
assessed for nuclease activity (Figure 3B and
Supplementary Figures S2-S4).
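
The combinatorics of one collection can be checked with a short Python sketch, which enumerates the 64 RVD triplets and the 64 target triplets and counts the mismatches of each pairing relative to the NI:A, HD:C, NN:G and NG:T code. Variable and function names are illustrative assumptions.

```python
from collections import Counter
from itertools import product

# RVD/nucleotide code used to define a mismatch
CODE = {"NI": "A", "HD": "C", "NN": "G", "NG": "T"}

rvd_triplets = list(product(CODE, repeat=3))                        # 64 arrays
target_triplets = ["".join(t) for t in product("ACGT", repeat=3)]   # 64 targets

def count_mismatches(rvds, triplet):
    """Number of RVD/nucleotide pairings that deviate from the code."""
    return sum(CODE[r] != n for r, n in zip(rvds, triplet))

pairs = [(r, t, count_mismatches(r, t)) for r in rvd_triplets for t in target_triplets]
by_mismatch = Counter(mm for _, _, mm in pairs)

print(len(pairs))
print(sorted(by_mismatch.items()))
```

The counts refer to one half-site. In the homodimer format each of these mismatches is present on both halves of the target, so one, two or three mismatched positions correspond to two, four or six mismatches in the full TALEN/target pair.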

Analysis of global nuclease activity as a function 
of RVD identity 

Data gathered from Collection 6 (focusing on positions
1-2-3) clearly highlight that the presence of an adenine
(A) or a cytosine (C) base at position 1 of the target
tends to have a deleterious effect on the TALEN
activity (Supplementary Figure S5A). A similar observation
was noted for an adenine at position 2, as previously
hypothesized in another study (27). In this analysis, the
error bars were relatively large because for each observed
position we aggregated values derived from targets having
all possible combinations on the remaining two positions.
We therefore further analyzed the results taking into
account fixed combinations of nucleotides/RVDs for the
first two positions. Among all targets tested, the DNA
sequences containing AAN and CAN (corresponding to
RVD pairs NI-NI-XX and HD-NI-XX) appeared to be
statistically the least favorable to achieve a high in vivo
nuclease activity (difference between AA and others:
P-value = 0.005 and difference between CA and others:
P-value = 0.002, Student t-test, Supplementary
Figure S6A). However, we cannot exclude that the
reduced activity monitored in yeast was due to
multiple parameters involving not only the binding affinity but also
protein stability or folding. In addition, these findings
have to be tempered by the fact that such effects were
strongly attenuated for the longer array (18.5 repeats) collection
(difference between AA and others: P-value = 0.2
and difference between CA and others: P-value = 0.04,
Supplementary Figures S5B and S6B). Notably,
although monitored on short array collections, no such
issues were observed when sliding the experimental
window along the array and target DNA from positions
1-2-3 to 3-4-5 and then 5-6-7 (Supplementary Figure S5C
and D). This analysis of the data suggested that the first
few N-terminal RVD/nucleotide pairs may have the
strongest impact on TALEN activity, consistent with
our first experiments in CHO-K1 and with other studies (28).

Furthermore, to assess possible context dependence
at the RVD level, we systematically analyzed the impact,
on a central RVD, of the two neighboring RVDs
(positions -1 and +1). The recovered uniform activity
levels strongly suggest independence of the central RVD
from its nearest (right and left) neighbors
(Supplementary Figure S7A and B). These data represent
an unambiguous indication that no strong short-distance (1 base)
context dependence between RVDs is present in the
N-terminal region of the TALEN array.

Figure 3. Setup of collections used in the large-scale experiments and graphical representation of activity results from Collection 6. (A) Collections of
TALE and targets used for the study in yeast, where X represents any of the four NI, HD, NN and NG RVDs and N any of the four A, C, G and T
bases. Collections 6-8 are composed of arrays containing 9.5 repeats and Collection 9 is composed of arrays of 18.5 repeats. The TALEN collections were
used in the homodimer format. (B) Heatmap showing the activity of the 64 TALEN of Collection 6 on the 64 corresponding targets. The outer line
of text of the target (abscissa) represents the first nucleotide of the NNN triplet, the middle line of text represents the second nucleotide and the
innermost line of text represents the third nucleotide. Likewise for the RVD array (XXX) on the ordinate. Red corresponds to maximum activity,
whereas blue corresponds to no activity. The diagonal with framed squares represents the NI:A, HD:C, NN:G and NG:T pairings. In the case of a
perfect one-to-one RVD/nucleotide association code, activity should be recovered on the diagonal only.

Analysis of the global effect of mismatches in 
RVD/nucleotide pairs 

We then systematically analyzed the impact on TALEN
activity of RVD/nucleotide mismatches with respect to the 
described code (NI:A, HD:C, NN:G and NG:T). Despite 
the fact that the NN RVD has been described to target 
both guanine and adenine nucleotides, in this analysis we 
consider as canonical only the NN:G pairing, as nearly all 
researchers are using this RVD uniquely in association 
with a G. When considering TALEN from Collection 6 
that follow the recognition code, 89% of the molecules
displayed robust activity on their cognate targets 
(Supplementary Table S3). Introducing two, four or six 
mismatches within the TALEN/target pairing led to a 
drop in active molecules to 30, 6 and 1%, respectively 
(Supplementary Table S3). In contrast, increasing the 
repeat array length to 18.5 (Collection 9) allowed 
accommodating up to six mismatches in 16% of the 
tested molecules while still maintaining a robust nuclease 
activity (Supplementary Table S3). The effect of mismatch 
number was position-independent from positions 1 to 7 
with similar decreases observed when the experimental 
window was shifted to positions 3-4-5 (Collection 7) or 
5-6-7 (Collection 8) when compared with Collection 6 
(Supplementary Table S3). 

RVD specificity and prediction of effect of mismatches on 
TALEN activity 

Having observed an absence of context dependence 
between RVDs, we next performed an in-depth analysis 
of our entire dataset to further decipher the individual 
specificity of each RVD. For each position, from 1 to 7, 
we computed an experimental matrix describing the spe- 
cificity of each RVD on the four DNA bases (Figure 4A 
and Supplementary Table S4). Interestingly, these 
matrices appear to be similar (mean standard deviation: 
0.1), indicating the absence of positioning effects on spe- 
cificity. By taking advantage of the conserved specificity of 
RVDs along the TALEN array, we then computed a 
global matrix (or logo) (Figure 4B and C and 
Supplementary Table S2) representing the global specifi- 
city of each RVD of a TALEN. The global matrix 
confirms that the currently used RVD/nucleotide pairing 
code represents the most appropriate solution to generate 
highly active TALEN. However, taken individually each 
RVD can tolerate to varying extents the three other bases, 
although at the expense of a reduction in activity 
(Figure 4B and C). 
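
As an illustration of the averaging step described above, the following minimal sketch builds a global matrix from per-position matrices and computes their spread. It assumes that each per-position matrix is a 4 x 4 array of relative activities (rows: NI, HD, NN, NG; columns: A, C, G, T) and that the "mean standard deviation" is the standard deviation across positions averaged over the 16 RVD/base cells. The layout, this definition and the function names are assumptions, and the numbers are synthetic placeholders, not the values of Supplementary Table S4.

```python
import numpy as np

RVDS = ["NI", "HD", "NN", "NG"]   # row order (assumed)
BASES = ["A", "C", "G", "T"]      # column order (assumed)

def global_matrix(per_position):
    """Average a stack of per-position matrices (positions x 4 x 4)."""
    return np.asarray(per_position, dtype=float).mean(axis=0)

def mean_std(per_position):
    """Standard deviation across positions, averaged over all 16 cells."""
    return np.asarray(per_position, dtype=float).std(axis=0).mean()

# Synthetic placeholder data for 7 positions: canonical pairs
# (NI:A, HD:C, NN:G, NG:T) near 1, all other pairs near 0.3.
rng = np.random.default_rng(0)
canonical = np.eye(4)
per_position = np.clip(
    0.3 + 0.7 * canonical + rng.normal(0, 0.05, size=(7, 4, 4)), 0, 1
)

G = global_matrix(per_position)
for rvd, row in zip(RVDS, G):
    print(rvd, dict(zip(BASES, row.round(2))))
print("mean standard deviation:", round(float(mean_std(per_position)), 3))
```

A small mean standard deviation across positions is what justifies collapsing the seven matrices into a single global one.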

We next wanted to determine whether a direct correl- 
ation between the score given by our matrix (based on 
in vivo activity measurements) and the sole protein/DNA 
binding parameter (based on in vitro measurements) could 
be found. Toward this goal, we designed a single TALE 
DNA binding array that targets an endogenous sequence 
of 17 bp (in line with predominant natural-size TALEs)
within the human IL2RG gene. The corresponding
protein (lacking the FokI catalytic domain,
Supplementary Table S5) was produced as a soluble
protein in E. coli and dissociation constants (Kd) for
several targets containing various mismatches with
respect to number (1-3), type and position were
determined in vitro (24). We found only a moderate
correlation between our scoring and the Kd values (r = -0.54
with P = 0.022, Supplementary Figure S8 and
Supplementary Table S5). We believe that this discrepancy 
between the two variables possibly reflects key differences 
between the two experimental setups. Indeed, in vitro ex- 
periments only characterize the intrinsic DNA binding 
properties of a TALE DNA binding domain, whereas 
the in vivo experiments characterized the overall TALEN 
activity that involved not only the direct binding 
properties of the TALE array but also the catalytic 
properties of the FokI catalytic domain (DNA binding
affinity, dimerization and rate of dsDNA cleavage) and
DNA repair mechanism (single-strand annealing or
non-homologous end joining). Although our results stressed
that the binding affinity (TALE/DNA) may be an important
contributor to the final nuclease output in living cells
(creation and repair of the DSB), we believe that add- 
itional detailed in vitro studies would be desirable to pre- 
cisely decipher the individual contribution of each 
parameter to the final output. One could also hypothesize 
that the in vitro measurements we obtained may better 
correlate with specificity of engineered TALE (where a 
single molecule is involved). 

Finally, we scored, using our global specificity matrix, 
the relative loss of activity due to the mismatches present 
in our collections from the previous CHO-K1 experiments
(Collections 1-5). Based on our previous data (Figure 1D,
E and F), we first divided our collection into two subsets 
of TALEN: (i) Collections 1-3 representing the 
'N-terminal specificity constant' part of the array and 
(ii) Collections 4 and 5 representing the C-terminal part 
of the array with gradual loss of specificity. The use of the 
global specificity matrix allowed accurate prediction of the 
loss of activity for both subsets of TALEN (Collections 
1-3: r = 0.81, P = 4e-16; Collections 4-5: r = 0.84,
P = 3e-12, Figure 5A). Furthermore, the variation of
the slope of the two regressions is in accordance with 
the gradual loss of specificity along the TAL DNA 
binding array we previously observed (Figure 1E).
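
The scoring step described above can be sketched as follows. This is only an illustration: it assumes that the per-position relative activities of the global matrix combine multiplicatively along the array, which is not stated in this section, and the matrix values below are invented placeholders rather than the published matrix. The function name `score_target` is hypothetical.

```python
# Placeholder relative activities (RVD -> base -> value).
# These are NOT the published values; canonical pairs are set to 1.0.
PLACEHOLDER_MATRIX = {
    "NI": {"A": 1.0, "C": 0.4, "G": 0.3, "T": 0.3},
    "HD": {"A": 0.3, "C": 1.0, "G": 0.1, "T": 0.3},
    "NN": {"A": 0.6, "C": 0.2, "G": 1.0, "T": 0.2},
    "NG": {"A": 0.3, "C": 0.4, "G": 0.2, "T": 1.0},
}

def score_target(rvds, target, matrix=PLACEHOLDER_MATRIX):
    """Predicted relative activity of an RVD array on a target sequence.

    Assumption: the score is the product of the per-position
    relative activities (1.0 for a fully canonical target).
    """
    if len(rvds) != len(target):
        raise ValueError("RVD array and target must have the same length")
    score = 1.0
    for rvd, base in zip(rvds, target.upper()):
        score *= matrix[rvd][base]
    return score

rvds = ["HD", "NG", "NI", "NN"]
print(score_target(rvds, "CTAG"))  # canonical target
print(score_target(rvds, "CTAA"))  # one NN:A mismatch at the last position
```

Under this assumption, each mismatch scales the predicted activity by the corresponding matrix entry, so several mismatches rapidly drive the score toward zero.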

Evaluation of off-site targeting on sequences containing 
low number of mismatches 

The total number of TALEN sites in the human genome
was estimated using our criteria (Materials and Methods)
to be on the order of 500 million, making it computationally
intractable to calculate the potential off-target sites
for every TALEN. Therefore, to statistically evaluate the
theoretical specificity of TALEN, a set of 15 000 potential
TALEN target sites (composed of 16 bases each) was
randomly picked throughout the human genome. All
off-site target sequences were then computationally
determined for each of these 15 000 TALEN sequences.

Figure 4. Graphical representation of RVD specificities. (A) Specificity measured for positions 1-7. The level of gray (black = 1, white = 0) represents the relative activity compared with the HD:C, NG:T, NI:A and NN:G RVD/nucleotide pairing code. (B) Average specificity of the four HD, NG, NI and NN RVDs on the first 7 positions. (C) Logo representation of the global specificity matrix. Logo was generated using WebLogo (http://weblogo.berkeley.edu/logo.cgi). Values for relative specificities are presented in Supplementary Table S4.

Figure 5. Effect of target mutations on TALEN activity in mammalian cells. (A) Correlation between experimental relative activities represented by the percentage of GFP-negative cells in the mammalian gene-targeting assay and scoring using the matrix presented in Figure 4B (Collections 1, 2 and 3 are represented in red: r = 0.81, P = 4e-16, and Collections 4 and 5 are represented in blue: r = 0.84, P = 3e-12). Linear regressions are presented for both subsets and 95% confidence intervals are represented by dashed lines. (B) Pie chart representation of the percentage of TALEN composed of 15.5 RVDs that will have, in the human genome, potential off-site targets containing no, one, two, three or four and more mismatches when considering a test set of 15 000 putative TALEN. All possible combinations of half TALEN (left + right, left + left, right + right) with a spacer length ranging from 9 to 30 bp were taken into account. (C) A collection of 33 targets comprising two AvrBs3 target sequences facing each other on both DNA strands with spacer length (between the two targets) ranging from 5 to 40 bp was designed and assayed in the CHO-K1 SSA assay to determine the optimal cleavage conditions. Targets containing a spacer of 21 and 35 bp were absent from the study.

As TALEN molecules result from the co-expression of
two monomers, we also searched for off-site targets
derived from the two potential palindromic TALEN 
target sites. Moreover, as the length of the DNA spacer 
between the two TALE recognition sites is known to 
tolerate a degree of flexibility (8-10,29), we included in
our search any DNA spacer size from 9 to 30 bp. Using 
these criteria, TALEN can be considered extremely 
specific as we found that for nearly two-thirds (64%) of 
those chosen TALEN, the number of RVD/nucleotide 
pairing mismatches had to be increased to four or more 
to find potential off-site targets (Figure 5B). In addition, 
the majority of these off-site targets should have most of
their mismatches in the first two-thirds of the DNA binding array
(representing the 'N-terminal specificity constant' part,
Figure 1). For instance, when considering off-site targets
with three mismatches, only 6% had all their mismatches 
after position 10 and may therefore present the highest 
level of off-site processing. Although localization of the 
off-site sequence in the genome (e.g. essential genes) 
should also be carefully taken into consideration, the specificity
data presented above indicate that most
TALEN should present only a low ratio of off-site/in-site
activities.
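
The off-site search described above can be sketched as follows. This is a simplified illustration and not the pipeline used in this study: it assumes that each half-site is given as the DNA sequence expected from the canonical RVD code, ignores the 5' T preceding each half-site, counts mismatches as a plain Hamming distance summed over both half-sites, and scans a single strand. Because the second monomer binds the opposite strand, its half-site appears as a reverse complement on the scanned strand, and both left+right and right+left orders are tested so that sites on either strand are found. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def find_off_sites(genome, left, right, max_mm=3, spacers=range(9, 31)):
    """Return (start, spacer, mismatches, pair) for every candidate site.

    Heterodimeric (left+right, right+left) and homodimeric
    (left+left, right+right) combinations are considered.
    """
    genome = genome.upper()
    n = len(left)
    if len(right) != n:
        raise ValueError("half-sites must have the same length")
    halves = {"left": left.upper(), "right": right.upper()}
    pairs = [("left", "right"), ("right", "left"),
             ("left", "left"), ("right", "right")]
    hits = []
    for up, down in pairs:
        up_seq, down_seq = halves[up], revcomp(halves[down])
        for i in range(len(genome) - n + 1):
            mm_up = hamming(genome[i:i + n], up_seq)
            if mm_up > max_mm:
                continue
            for sp in spacers:
                j = i + n + sp
                if j + n > len(genome):
                    continue
                mm = mm_up + hamming(genome[j:j + n], down_seq)
                if mm <= max_mm:
                    hits.append((i, sp, mm, up + "+" + down))
    return hits

# Toy example with 6-bp half-sites and a 12-bp spacer.
left, right = "ACGTAC", "GATTAC"
genome = "GG" + left + "A" * 12 + revcomp(right) + "CC"
print(find_off_sites(genome, left, right, max_mm=0))
```

On a real genome this brute-force scan would be far too slow; an indexed search would be needed, which is beyond the scope of this sketch.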

To confirm this hypothesis, we designed six TALEN 
that present at least one potential off-target sequence con- 
taining between one and four mismatches. For each of 
these TALEN, we measured by deep sequencing the fre- 
quency of indel events generated by the non-homologous 
end-joining (NHEJ) repair pathway at the possible DSB 
sites. The percent of indels induced by these TALEN at 
their respective target sites was monitored to range from 1 
to 23.8% (Table 1). We first determined whether such 
events could be detected at alternative endogenous
off-target sites containing four mismatches. Substantial
off-target processing frequencies (>0.1%) were only
detected at two loci (OS2-B, 0.5%; and OS3-A, 0.4%,
Table 1). Notably, as expected from our previous experiments,
the two off-target sites presenting the highest
processing contained most mismatches in the last third of
the array (OS2-B, OS3-A, Table 1). Similar trends were
obtained when considering three mismatches (OS1-A,
OS4-A and OS6-B, Table 1). Also noteworthy is the observation
that TALEN could have an unexpectedly low
activity on off-site targets, even when mismatches were
mainly positioned at the C-terminal end of the array,
when spacer length was unfavorable (e.g. Locus2, OS1-A,
OS2-A or OS2-C; Table 1 and Figure 5C).
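
The quantities reported in Table 1 can be recomputed from the read counts, as sketched below. The indel frequency is the number of indel-containing reads divided by the total number of reads. The definition of the off-site/in-site ratio as the off-site frequency expressed as a percentage of the in-site frequency is an assumption inferred from the table columns, and the function names are hypothetical. The read counts are those listed in Table 1.

```python
def indel_percent(indel_reads, total_reads):
    """Indel frequency (%) from deep-sequencing read counts."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * indel_reads / total_reads

def off_to_in_site_ratio(off_indels, off_reads, in_indels, in_reads):
    """Off-site indel frequency as a percentage of the in-site frequency.

    Assumed definition, inferred from the columns of Table 1.
    """
    in_site = indel_percent(in_indels, in_reads)
    if in_site == 0:
        raise ValueError("in-site indel frequency is zero")
    return 100.0 * indel_percent(off_indels, off_reads) / in_site

# Cognate target L1: 424 indel reads out of 1785.
print(round(indel_percent(424, 1785), 1))
# Off-site OS3-A (4 / 933) relative to its cognate target L3 (78 / 599).
print(round(off_to_in_site_ratio(4, 933, 78, 599)))
```

Working from raw counts rather than rounded percentages avoids compounding rounding errors when the frequencies are small.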

Although a larger in vivo data set would be desirable to 
precisely quantify the trends we underlined, taken together 
our data indicate that TALEN can accommodate only a 
relatively small (<3-4) number of mismatches relative to
the currently used code while retaining a significant 
nuclease activity. 

DISCUSSION 

Although TALEs appear to be one of the most promising 
DNA-targeting platforms, as evidenced by the increasing 
number of reports, limited information is currently avail- 
able regarding detailed control of their activity and speci- 
ficity (6,7,16,18,30). In vitro techniques [e.g. SELEX (8) or 
Bind-n-Seq technologies (28)] dedicated to measurement 
of affinity and specificity of such proteins are mainly 
limited to variation in the target sequence, as expression 
and purification of high numbers of proteins still remains 
a major bottleneck. To address these limitations and to 
additionally include the nuclease enzymatic activity par- 
ameter, we used a combination of two in vivo methods to 
analyze the specificity/activity of TALEN. We relied on
both an endogenous integrated reporter system in a



Table 1. Activities of TALEN on their endogenous cognate target (L1 to L6) and potential off-target sequences

Locus   Number of    Indels   Ratio             Spacer   Left target        Right target       Location                       Indels   Reads
        mismatches   (%)      off-site/in-site  length   (5'-3')            (5'-3')
                              (%)               (bp)

L1      0            23.8     100               15                                             chr7:148,544,235-148,544,283   424      1785
OS1-A   3            0        0                 27       ttaattgtatattGat   ttaattAtatattTat   chr6:67,836,053-67,836,113     0        12420
OS1-B   4            0        0                 10       ttaattTtatattcat   tTaTgtaaaggAataa   chr2:167,063,216-167,063,259   0        12081
OS1-C   4            0        0                 22       tgaagAaaaggAataa   ttTattgtatattAat   chr21:22,558,167-22,558,222    0        7223
OS1-D   4            0.01     0.04              16       tTaagtaaaAAtataa   ttaattTtatattcat   chr1:80,191,202-80,191,251     1        7243
OS1-E   4            0.05     0.2               29       ttaattTtatattcat   ttTaGtTtatattcat   chr2:167,063,216-167,063,278   7        12902
L2      0            10.7     100               15                                             chr7:116,335,791-116,335,839   599      5601
OS2-A   4            0        0                 20       tccttcttcGcTgggC   tccttcttcacaTggt   chr19:11,753,683-11,753,736    0        763
OS2-B   4            0.5      5                 12       tccttcttcacaAggt   tcctCcttcacaCgCt   chr7:155,697,642-155,697,687   14       2774
OS2-C   4            0        0                 17       tccttcttcacTgggA   tccttcttcacaTggC   chr9:28,655,171-28,655,221     0        1003
L3      0            13       100               15                                             chr18:45,423,236-45,423,284    78       599
OS3-A   4            0.4      3                 16       tttcaactGatAGtag   tctcaacttatcatag   chr12:93,895,410-93,895,459    4        933
L4      0            1        100               15                                             chr7:57,659,395-57,659,443     42       4171
OS4-A   3            0.4      40                15       ttctaggaaaccaCct   tcttaCtaattctGtt   chr17:16,097,940-16,097,988    12       3141
OS4-B   1            1        100               15       tcttattaattctatt   ttctaggaaaccaCct   chr17:21,535,905-21,535,953    11       1099
L5      0            17.3     100               15                                             chr12:58,145,360-58,145,408    877      5081
OS5-A   4            0.05     0.3               20       tcctccaTctcTAcct   tccAccacctcctcct   chr11:47,746,108-47,746,161    1        2192
OS5-B   4            0        0                 27       tcctccTcctcctcct   tcctTcTTctcctcct   chr2:67,777,895-67,777,955     0        2228
L6      0            6.6      100               15                                             chr7:26,242,732-26,242,780     223      3357
OS6-A   4            0.07     1                 21       ttttcATctgtaattt   ttaTatccTcatattt   chr5:74,301,559-74,301,613     1        1485
OS6-B   3            0        0                 21       ttttcccctgtaattt   tAacaGTcacatattt   chrX:71,283,522-71,283,580     0        565

5400 Nucleic Acids Research, 2014, Vol. 42, No. 8 



CHO-K1 mammalian cell line and a plasmid-based
nuclease activity assay in yeast. These two approaches had the
major advantages of being scalable to medium or high
throughput and of minimizing or normalizing bias from
epigenetic variations. To extend our knowledge on activity
and specificity of RVDs along TALEN arrays, we thus
analyzed the cleavage profiles of >15 500 TALEN/target
combinations, leading to the most exhaustive analysis of 
this new class of DNA-targeting tools available today. 

Although guidelines based on computational analysis of
a small subset (20) of natural effectors have previously
been published (27), recent studies showed that failing to
follow these 'guidelines' had little or no effect on TALEN
activity (11,31). Our systematic approach, based on
investigating windows of RVD triplets along the TALE 
array, reveals the positional effects of unfavorable TALE 
DNA pairings in the N-terminal region of the DNA 
binding array. In agreement with Cermak et al. (27), we 
found that TALE designs should avoid targeting an 
Adenine (A) residue at position 2. Notably, we also 
show that the presence of AA or CA at the first two pos- 
itions of the target sequence leads to a decrease in activity 
and should therefore be removed from TALE hit-search 
engines, especially when designing short arrays (in the 
range size of 9.5-12.5 repeats). Currently, the commonly 
used array lengths (15.5-20.5 repeats) should only be 
weakly impacted by the use of these combinations, 
although a larger data set composed of TALEN of 
common size (e.g. 15.5 repeats) harboring these two par- 
ticular N-terminal RVD compositions would be desirable 
to confirm or refute the validity of these findings for
current TALEN designs. However, the use of shorter 
arrays could be of interest, as, when rationally and care- 
fully designed (e.g. by the use of our specificity matrices), 
these arrays are more sensitive to mismatches and the 
resulting TALEN should potentially reveal a higher spe- 
cificity as recently described (32,33). 
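
The N-terminal design rules discussed above could be applied in a hit-search engine along the following lines. This is a minimal sketch: it assumes that the target string starts at position 1 of the array (that is, the 5' T at position 0 has already been removed), and the function names and the wording of the flags are hypothetical. The range of 9.5-12.5 repeats used for short arrays is taken from the text.

```python
def n_terminal_flags(target):
    """List unfavorable N-terminal features of a TALE target sequence.

    `target` is assumed to start at position 1 (the 5' T0 is excluded).
    """
    t = target.upper()
    flags = []
    if len(t) >= 2 and t[1] == "A":
        flags.append("A at position 2")
    if t[:2] in ("AA", "CA"):
        flags.append(t[:2] + " at positions 1-2")
    return flags

def is_short_array(n_repeats):
    """Short arrays (9.5-12.5 repeats) are expected to be most affected."""
    return 9.5 <= n_repeats <= 12.5

for target, n_repeats in [("CATGGTCAT", 9.5), ("GCTAGGTCAGGTCATT", 15.5)]:
    flags = n_terminal_flags(target)
    severity = "short array" if is_short_array(n_repeats) else "common size"
    print(target, severity, flags or "no flag")
```

A design tool could either discard flagged targets outright for short arrays or merely down-rank them for arrays of common size.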

In a recent large-scale study, Church and coworkers
aimed to evaluate the landscape of targeting specificity
of not only the clustered regularly interspaced short
palindromic repeats (CRISPR)-Cas system but also of
TALEs (32). They found, in accordance with previous
results, that N-terminal repeats are more sensitive to 
mismatches, a trend that was also evidenced in this 
study for TALEN. They also reported that long arrays 
containing 18 repeats can tolerate up to two or three 
mismatches and shorter arrays (14 and 10 repeats) are 
much less tolerant to mismatches. The total number of 
tolerated mismatches in single TALE was roughly half 
of the one we reported in this study for TALEN. 
However, their study and ours raised a few differences 
between the behaviors of TALE and TALEN. In particu- 
lar, they noticed a decrease in the activity on par with the 
reduced size of the array (18-10 repeats), a characteristic 
that we and others have not observed for TALEN (11). 
This feature might result from the specific architecture of 
TALEN that requires two binding monomers (versus only 
one for the TALE) and the dimerization of the FokI catalytic
domains. A second observation that was not confirmed
in our study using TALEN is the fact that
mutations in the middle of the array can lead to higher
activities (32). 

The TALE 'code' currently used by most researchers is 
based on statistical analyses of a limited number of 
natural effectors and target gene promoters leading to 
the analysis of only a fraction of all pairing possibilities. 
In addition, when looking at a naturally occurring pairing 
between TALE and a plant target promoter, most, if not 
all, contain mismatches relative to a perfect one-to-one 
code (NI:A, HD:C, NN:G and NG:T). For instance,
AvrBs3 (the TALE scaffold used in this study) contains, 
when bound to its target, three mismatches (two HD:A 
and one NG:C at positions 1, 15 and 17.5, respectively) 
indicating a certain degree of liberty relative to a perfect 
one-to-one association. In this work, we aimed at 
providing an extensive knowledge of the specificity of 
each TALE binding module (NI, HD, NN and NG)
along TALEN/DNA binding arrays. Interestingly, 
although individual RVDs can tolerate mismatches, the 
cumulative effect of multiple mismatches within an
array rapidly outweighs the overall TALE DNA-binding
affinity. Additionally, our experimental designs
allowed us to report in particular (i) the absence of short 
distance context dependence between RVDs, (ii) the 
gradual decrease of specificity within the C-terminal half 
array and (iii) a predictive model of RVD specificity 
scoring. 

The combination of our experimental results with a 
large-scale computational analysis of 15 000 randomly 
chosen potential target sites and TALEN indicated that 
two-thirds of the nucleases (composed of two DNA- 
targeting cores of 15.5 repeats) should show a strong pref- 
erence for the designed targets over possible off-site 
sequences, especially when the mismatches are present at 
the N-terminal end of the DNA-targeting core (Figure 5B, 
Table 1). Consistent with this statistical analysis, the
experimental characterization of off-site mutations on
putative targeted sites confirmed that TALEN having
genomic off-site targets with fewer than four mismatches should
be avoided. The finding that TALEN have such high
specificity is also consistent with previous studies reporting
the absence of observed toxicity in HEK293 cells (29,34)
and undetected or very low off-site targeting frequencies in
rat (35), stem cells (36), Xenopus embryos (37) and the
yeast genome (38). However, although this cutoff of
four mismatches should be sufficient for a majority of
designs, specific applications may require additional
levels of safety (e.g. a higher mismatch cutoff or the use of
obligatory FokI heterodimers). As reported by Hockemeyer
et al., TALEN containing up to nine mismatches (for two
DNA binding cores of 16.5 repeats) can still show indel
mutations at detectable frequencies (ratio off-site/in-site:
0.5%) (36).

Nonetheless, a potential caveat to our findings is that
we did not take into consideration epigenetic factors that
will inescapably be present at endogenous loci of most,
if not all, organisms. For example, we (23) and others
(13,31,39) recently reported the sensitivity of TALE-based
arrays to cytosine methylation, a complexity currently
not addressed in our experimental setup, which
should however further decrease off-site targeting.
Nevertheless, additional experiments with larger sampling
that combine epigenetic factors and mismatches would be
desirable to fully assess the potential of this technology.
We further envision that the constant accumulation of
experimental data on TALEN activity and specificity
will allow these open questions to be answered rapidly
in the near future.

Although we believe that our process is a step toward
improving automated design methods for TALE-based
molecular tools, additional issues such as the use of rare
or unnatural RVDs remain to be addressed to further
extend the targeting possibilities. However, we envision
that, together with the advances in, and access to,
genome sequencing and epigenetic information, the
implementation of our experimental model will permit a more
rigorous and educated design of TALE-based tools.
Beyond off-site prediction, the precise knowledge of
alternative RVD/nucleotide pairing opens new possibilities to
discriminate between sequences for applications requiring,
for example, allele-specific targeting. In conclusion, we
anticipate that the provided results will expand our
engineering capabilities by increasing our level of experimental
control of TALE-based molecular tools, notably for
applications in synthetic biology (22,40,41).